Segmentation of prosodic phrases for improving the naturalness of synthesized Mandarin Chinese speech
Abstract
In natural speech, sentences are broken into breath groups. Some words seem to be more closely grouped with adjacent words; we call these groups prosodic phrases. To improve the naturalness of synthesized speech, prosodic processing is needed in both the text-processing component and the speech generation component. The text-processing component is the more important of the two, because the performance of the speech generation component depends on the output of the text-processing component. This paper discusses how to break sentences into prosodic phrases. First, the text is segmented into Chinese words. These words are then annotated by an automatic Part-of-Speech (POS) tagger. Adjacent words that have a close syntactic relation are grouped to form prosodic phrases using the POS tags and syntactic phrase structure information. When breaking sentences into prosodic phrases, other factors must also be taken into consideration, such as speech velocity, pragmatic knowledge, the context, and the speaker's feeling. The POS tagging algorithm integrates a statistical method with a rule-based method. A 2-gram (bigram) Markov language model is used in the algorithm. The most likely POS sequence for a given sentence is found by searching through the language model and picking the most likely path. A rule-based step, which identifies a word's category using context information, then corrects the errors made by the statistical method. In experiments, the tagger correctly tagged 94% of words in an independent test set of 1.2 thousand Chinese characters. Rules that draw on lexical information and phrase structure information are then used to form prosodic phrases. In experiments we obtained a break-correct rate of 86% and a recall rate of 90%. After segmentation into prosodic phrases, the grouped words are read continuously when the text is converted to speech, which improves the naturalness of the synthesized speech.
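The statistical tagging step described in the abstract (a bigram Markov model searched for its most likely path) can be illustrated with a minimal Viterbi sketch. The abstract does not name the search procedure, so the use of Viterbi decoding here is an assumption, as are the function name `viterbi_bigram`, the tag labels (`r`, `v`, `n`), the probability floor for unseen events, and all toy probabilities. None of these values come from the paper, and the rule-based correction step is not shown.

```python
import math

def viterbi_bigram(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Most likely tag sequence under a bigram (first-order) Markov model.

    Hypothetical sketch: unseen events get a small floor probability
    instead of a properly smoothed estimate.
    """
    if not words:
        return []

    def lp(p):
        # Log-probability with a floor so unseen events do not yield log(0).
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log-prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Pick the predecessor tag that maximizes path score plus transition.
            score, prev = max(
                (best[i - 1][p][0] + lp(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + lp(emit_p[t].get(words[i], 0)), prev)
        best.append(column)

    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy model (all values invented for illustration): r = pronoun, v = verb, n = noun.
TAGS = ["r", "v", "n"]
START_P = {"r": 0.6, "v": 0.1, "n": 0.3}
TRANS_P = {
    "r": {"r": 0.1, "v": 0.7, "n": 0.2},
    "v": {"r": 0.2, "v": 0.1, "n": 0.7},
    "n": {"r": 0.1, "v": 0.5, "n": 0.4},
}
EMIT_P = {
    "r": {"我": 0.9},
    "v": {"爱": 0.8},
    "n": {"北京": 0.7, "爱": 0.1},
}

WORDS = ["我", "爱", "北京"]  # already segmented into words
print(viterbi_bigram(WORDS, TAGS, START_P, TRANS_P, EMIT_P))
```

In a full pipeline along the lines of the abstract, the tag sequence returned here would next pass through the rule-based correction step and then feed the phrase-grouping rules.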
Similar resources
Prosody generation in Chinese synthesis using the template of quantified prosodic unit and base intonation contour
This paper presents a prosody generation method for Mandarin Chinese using the template of quantified prosodic unit and base intonation contour. This method uses the prosodic features extracted by rule from the syllables in prosodic words as the base unit, and integrates the prosody rules for prosodic words in Mandarin Chinese with the base intonation contour to obtain the prosody contours with...
Prosodic Alternative Units in a Mandarin Chinese Speech Synthesizer
The Mandarin Chinese synthesis component of the Dresden Speech Synthesizer DreSS is based on an inventory of syllabic units. The inventory contains all Chinese syllables with the possible tones in up to three phonetic variations for a correct modeling of the cross syllable coarticulation effects. In order to improve the naturalness and fluency of the synthesized speech, the inventory was comple...
Multi-strategy data mining on Mandarin prosodic patterns
Mandarin prosodic models, which mainly describe the variation of pitch, are very important in speech research and synthesis. The models now used in most Chinese Text-To-Speech systems are constructed by experts, qualitatively and with low precision. In this paper, we propose a Multi-strategy Data Mining framework to extract prosodic patterns from actual large Mandarin speech datab...
Prosodic Boundary Prediction Based on Maximum Entropy Model with Error-Driven Modification
Prosodic boundary prediction is the key to improving the intelligibility and naturalness of synthetic speech for a TTS system. This paper investigated the problem of automatic segmentation of prosodic words and prosodic phrases, which are two fundamental layers in the hierarchical prosodic structure of Mandarin Chinese. Maximum Entropy (ME) Model was used at the front end for both prosodic word a...